Multinomial Flight Modeling

Flight Trials Winter 2020 were conducted from 2/17/2020 - 3/10/2020. Soapberry bugs were flight tested twice (T1 vs. T2) for multiple hours in the flight mill and observed from 8 AM to (5-8 PM) each day. I used multicategorical, nominal modeling (multinom) to analyze how sex and host plant as well as changes in mass and egg-laying changed a soapberry bug’s flight response between trials.

Delta Flight Response (T1 vs. T2)

Introduction

Clean the Data

Flight Case ~ Mass Diff

Flight Case ~ Mass Per

Flight Case ~ Mass Diff or Mass Perc + Sex

Flight Case ~ Mass Diff or Mass Perc + Sex + Host Plant

Flight Case ~ Mass Diff or Mass Perc + Sex + Wing2Body

Females Only

Introduction

Graphs I generated below illustrate how changes in mass (more specifically % mass) or egg-laying led to changes in flight response. To generate these graphs, I ran multicategorical regressions to model the probability of different flight delta cases due to sex, host plant, and changes in mass or egg-laying. Changes in speed and distance as a result of changes in mass or egg-laying were not analyzed here but rather in a separate script.

To perform these analyses, a new variable will be created called “flight_case” which calculates the flight response differences between T1 and T2. Here is a formula and key describing each flight case event and encoding:

Formula : flight case = response in \(T_2\) - response in \(T_1\)

Delta Flight Response Key
Event Encoding
flew in both trials 2
flew in T2 only 1
flew in neither trials 0
flew in T1 only -1

Since the outcomes (or response variables) were no longer binomial, I used multicategorical logit models (refer to Chapter 6 of An Introduction to Categorical Data Analysis, Second Edition by Alan Agresti). Below I’ve written out notes on performing these analyses.

What we are interested in is a multicategorical, nominal response variable. When the response variable is nominal, there is no natural order among the response variable categories (unordered categories). When the response variable is multicategorical, its multicategory models assume that the counts in the categories of \(Y\) have a multinomial distribution.

Let J = number of categories for \(Y\).

\(\pi_1,...\pi_J\) = the response probabilities where \(\sum_j \pi_j = 1\).

With \(n\) independent observations, the probability distribution for the number of outcomes of the J types is the multinomial. It specifies the probability for each possible way the \(n\) observations can fall in the J categories. Multicategory logit models simultaneously use all pairs of categories by specifying the odds of outcome in one category instead of another.

Logit models for nominal response variables pair each category with a baseline category. The choice of the baseline category is arbitrary. When the last category (J) is the baseline, the baseline-category logits are

\(\log (\frac{\pi_j}{\pi_J}), j = 1, ..., J - 1\).


Given that the response falls in category j or category J, this is the log odds that the response is j. For J = 3, for instance, the model uses \(\log(\pi_1/\pi_3)\) and \(\log(\pi_2/\pi_3)\). The baseline-category logit model with a predictor x is

\(\log \frac{\pi_j}{\pi_J} = \alpha_J + \beta_j x, j = 1, ..., J - 1\)


The model has J - 1 equations, with separate parameters for each. The effects vary according to the category paired with the baseline. When J = 2, this model simplifies to a single equation for \(\log(\pi_1/\pi_2) = logit(\pi_1)\), resulting in ordinary logistic regression for binary responses.

So how do these pair of categories determine equations for all other pairs of categories? Here is an arbitrary pair of categories a and b that follows the general equation above,

\(\log (\frac{\pi_a}{\pi_b}) = \log ( \frac{\pi_a/pi_J}{\pi_b/pi_J}) = \log (\frac{\pi_a}{\pi_J}) - \log (\frac{\pi_b}{\pi_J})\)

\(= (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x)\)

\(= (\alpha_a - \alpha_b) + (\beta_a - \beta_b) x\)


So, the equation for categories a and b has the form \(\alpha + \beta x\) with intercept parameter \(\alpha = (\alpha_a - \alpha_b)\) and with slope parameter \(\beta = (\beta_a - \beta_b)\).

Let’s apply it to our data now:

Cleaning the Data

output_col = FALSE # Change to TRUE if working in Base R or RStudio; FALSE if generating an HTML
source("src/clean_flight_data.R") # Loads and cleans data
source("src/regression_output.R") # Cleans regression outputs; prints in color or black & white
source("src/center_flight_data.R") # Re-centers data
source("src/get_warnings.R")

data <- read_flight_data("data/all_flight_data-Winter2020.csv")
data_all <- data[[1]]
data_tested <- data[[2]]
source("src/unique_flight_data.R")
d <- create_delta_data(data_tested)
colnames(d)[c(1:2,5,66:68,71:73)]
## [1] "ID"          "sex"         "host_plant"  "egg_diff"    "mass_diff"  
## [6] "flew_diff"   "mass_per"    "flight_case" "egg_case"

Age Effect

Age also plays a role in this dataset, but age was unknown because the soapberry bugs were field collected.

Modeling

Below I used the multinom function from the nnet package to estimate a multinomial logistic regression model. There are other functions in other R packages capable of multinomial regression. I chose the multinom function because it does not require the data to be reshaped (as the mlogit package does) and to mirror the example code found in Hilbe’s Logistic Regression Models.

First, I chose the baseline by specifying the level using relevel function. Then, I ran our model using multinom. The multinom package does not include p-value calculations for the regression coefficients, so I calculated P using Wald tests (i.e. z-tests).

Finally, I considered mass difference as my predictor variable where a (+) sign means the bug gained mass between trials and a (-) sign means the bug lost mass between trials.

Delta Mass Key
Event Sign
gained mass from T1 to T2 pos
no mass change between trails 0
lost mass from T1 to T2 neg

Baseline

df <- d %>%
  filter(!is.na(mass_diff), !is.na(flight_case)) 
  
df <- df[with(df, order(mass_diff)),]
n_trials = nrow(df)

df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")

Null model

null <- multinom(flight_case ~ 1, data = df) 
## # weights:  8 (3 variable)
## initial  value 385.389832 
## iter  10 value 319.269929
## final  value 319.269680 
## converged

Multinom model: flight_case ~ mass_diff

model <- multinom(flight_case ~ mass_diff, data = df) 
## # weights:  12 (6 variable)
## initial  value 385.389832 
## iter  10 value 317.510452
## iter  20 value 310.451788
## iter  30 value 308.978871
## iter  40 value 308.544941
## iter  50 value 308.493264
## iter  60 value 308.481503
## iter  70 value 308.459043
## iter  80 value 308.445007
## iter  90 value 308.440565
## final  value 308.439954 
## converged
model_table = calculate_P(model)
## 
##  AIC:  628.8799 
##    (Intercept)  Estimate DF Std. Err.     z   wald P > |z|
## -1      -0.770    69.211  6    16.478 4.200 17.641   0.000
## 1       -2.129     6.699  6    28.958 0.231  0.054   0.817
## 2        0.420    25.328  6    12.662 2.000  4.002   0.045
anova(null, model, test="Chisq") # adding mass_diff improves fit 
## Likelihood ratio tests of Multinomial Models
## 
## Response: flight_case
##       Model Resid. df Resid. Dev   Test    Df LR stat.      Pr(Chi)
## 1         1       831   638.5394                                   
## 2 mass_diff       828   616.8799 1 vs 2     3 21.65945 7.678753e-05

Which multinomial model equations are significant:

get_significant_models()

The multinomial (ML) prediction equations are:

prediction_equations(model_table) 
## [1] "log(pi_-1 / pi_1) = 1.36 + 62.51 Mass Change     Flew in T1, rather than T2"   
## [2] "log(pi_2 / pi_-1) = 1.19 + -43.88 Mass Change     Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 2.55 + 18.63 Mass Change     Flew in both, rather than T2"
N Model P
278 Flew in T1 only rather than T2 only = 1.36 + 62.51 Mass Change not sig
278 Flew in both rather than T1 only = 1.19 - 43.88 Mass Change sig
278 Flew in both rather than T2 only = 2.55 + 18.63 Mass Change not sig

Estimated Odds

For bugs of mass_diff x + 1/50 (0.02) g, the estimated odds that the bug flew twice rather than only in T1 equals

exp(-43.88/50)
## [1] 0.4157796

times the estimated odds at mass_diff x (g). As gain mass, then less likely to fly again.

exp(coef(model)/50) # this compares to not flying, the baseline | 1/0 not significant
##    (Intercept) mass_diff
## -1   0.9847154  3.991693
## 1    0.9583206  1.143376
## 2    1.0084255  1.659584

Predicted probabilities

You can also use predicted probabilities to help you understand the model. The multicategory logit model has an alternative expression in terms of the response probabilities. This is

\(\pi_j = \frac{e^{\alpha_j + \beta_j x}}{\sum_he^{\alpha_h + \beta_h x}}, j=1,...,J\)


The denominator is the same for each probability, and the numerators for various j sum to the denominator where \(\sum_j \pi_j = 1\). The parameters (\(\alpha\) and\(\beta\)) equal zero for whichever category is the baseline in the logit expressions. So these would be the equations for the model above:

\(\hat{\pi_{-1}} = \frac{e^{-1.77 + 69.21 x}}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)

\(\hat{\pi_1} = \frac{e^{-2.13 - 6.70 x}}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)

\(\hat{\pi_2} = \frac{e^{0.42 + 25.33 x}}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)

\(\hat{\pi_0} = \frac{e^{0 + 0 x} = 1}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)

x = mass_diff, which can give you the estimated probabilities.


You can calculate these predicted probabilities for each of our outcome levels using the fitted function. You can start by generating the predicted probabilities for the observations in our dataset and viewing the first few rows. Then, you can plot them.

head(pp <- fitted(model))
##           0         -1          1         2
## 1 0.6268782 0.01288591 0.05518211 0.3050538
## 2 0.5919733 0.01843237 0.05424679 0.3353476
## 3 0.5859452 0.01955212 0.05405531 0.3404474
## 4 0.5612480 0.02470149 0.05318314 0.3608674
## 5 0.5485567 0.02772717 0.05268168 0.3710345
## 6 0.5356441 0.03109397 0.05213548 0.3811264

Multinom model: flight case ~ mass_per

Re-order

df <- df[with(df, order(mass_per)),]
model <- multinom(flight_case ~ mass_per, data = df) 
## # weights:  12 (6 variable)
## initial  value 385.389832 
## iter  10 value 307.053645
## final  value 306.984778 
## converged
model_table = calculate_P(model)
## 
##  AIC:  625.9696 
##    (Intercept)  Estimate DF Std. Err.     z   wald P > |z|
## -1      -0.909     0.044  6     0.010 4.492 20.179   0.000
## 1       -2.132     0.001  6     0.020 0.043  0.002   0.965
## 2        0.344     0.021  6     0.008 2.494  6.221   0.013

Model comparing flying only in T2 to the baseline (not flying at all) is not significant (P = 0.97). That is probably because so few bugs (only male) flew in T2 only.

Significant ML equations

run_multinom_model = function(d) {
  m <- multinom(flight_case ~ mass_per, trace=FALSE, data = d) 
  return(m)
}
get_significant_models()

The multinomial (ML) prediction equations are:

gsub("Mass Change",  "Mass Percent Change", prediction_equations(model_table))
## [1] "log(pi_-1 / pi_1) = 1.22 + 0.04 Mass Percent Change     Flew in T1, rather than T2"   
## [2] "log(pi_2 / pi_-1) = 1.25 + -0.02 Mass Percent Change     Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 2.48 + 0.02 Mass Percent Change     Flew in both, rather than T2"
N Model P
278 Flew in T1 only rather than T2 only = 1.22 + 0.04 Mass % sig
278 Flew in both rather than T1 only = 1.25 - 0.02 Mass % sig
278 Flew in both rather than T2 only = 2.48 + 0.02 Mass % not sig

Estimated Odds

For bugs of mass_perc x + 20%*(0.04) the estimated odds that bug flew in T1 rather than T2 equals

exp(0.04*20)
## [1] 2.225541

times the estimated odds at mass_perc x (%). So for every 20% increase in a bug’s original mass, the bug is 2.23 times more likely to fly only in T1. That makes sense because the odds of flying in T2 only are very low to begin with.

For bugs of mass_perc x + 40%*(-0.02), the estimated odds that the bug flew twice instead of only in T1 equals

exp(40*-0.02) # 2.5 times less likely to fly twice if gain mass
## [1] 0.449329
exp(-40*-0.02) # 2.5 times more likely to fly twice if loose mass
## [1] 2.225541

times the estimated odds at mass_perc x (%). Similar message - for every 40% increase in a bug’s original mass, the bug was 2.23 times more likely to fly only in T1 rather than fly twice. Although, when a bug looses 40% of original mass then becomes more likely to fly twice then only once.

exp(coef(model)*20) # this compares to no fly, the baseline # gain large % mass
##     (Intercept) mass_per
## -1 1.264064e-08 2.415238
## 1  3.047172e-19 1.017407
## 2  9.660792e+02 1.508336
exp(coef(model)*5) # gain small % mass
##     (Intercept) mass_per
## -1 1.060333e-02 1.246637
## 1  2.349493e-05 1.004324
## 2  5.575107e+00 1.108216
exp(coef(model)*-5) # loose small % mass
##     (Intercept)  mass_per
## -1 9.431000e+01 0.8021582
## 1  4.256237e+04 0.9956950
## 2  1.793688e-01 0.9023509
exp(coef(model)*-20) # loose large % mass
##     (Intercept)  mass_per
## -1 7.910994e+07 0.4140378
## 1  3.281732e+18 0.9828910
## 2  1.035112e-03 0.6629821
# 1/0 not significant

1 vs. 0 is not significant as already stated. However, -1 vs. 0 shows that as gain more mass then become increasingly more likely to only fly once. If loose mass then most likely to not fly at all, and more likely to fly twice then fly only once.

head(pp <- fitted(model))
##           0         -1          1         2
## 1 0.5386483 0.04284419 0.06190317 0.3566043
## 2 0.5356362 0.04375052 0.06158899 0.3590243
## 3 0.5290568 0.04577594 0.06090110 0.3642662
## 4 0.5263205 0.04663701 0.06061436 0.3664282
## 5 0.4982068 0.05614792 0.05764729 0.3879980
## 6 0.4949167 0.05734314 0.05729764 0.3904425

For more on multinomial plotting for when have more than 1 predictor variables see the following: https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/

Now let’s compare some models

First: flight_case, mass_diff, and sex

df <- df[with(df, order(mass_diff)),]
data <- data.frame(R = df$flight_case, 
         A = df$mass_diff,
         B = df$sex_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 2 FF.R")
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
##        [,1]      [,2]     
## AICs   589.7091  593.7062 
## models 3         4        
## probs  0.8804523 0.1193253
## 
## m3   multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m4   multinom(formula = R ~ A * B, data = data, trace = FALSE)
anova(m3, m4, test="Chisq") # Adding A*B does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##   Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1 A + B       825   571.7091                                
## 2 A * B       822   569.7062 1 vs 2     3 2.002834 0.5718186
delta_mass_model <- multinom(flight_case ~ mass_diff + sex_c, data = df)
## # weights:  16 (9 variable)
## initial  value 385.389832 
## iter  10 value 297.481645
## iter  20 value 287.067261
## iter  30 value 286.119187
## iter  40 value 285.989229
## iter  50 value 285.976242
## iter  60 value 285.937258
## iter  70 value 285.917216
## iter  80 value 285.877940
## iter  80 value 285.877937
## iter  90 value 285.874179
## iter 100 value 285.854534
## final  value 285.854534 
## stopped after 100 iterations
model_table = calculate_P2(delta_mass_model, "mass_diff", "sex_c")
## 
##  AIC:  589.7091 
##    (Intercept) mass_diff   sex_c DF    SE1   SE2     z1      z2 ML wald
## -1      -0.965    70.907  -0.788  9 17.662 0.219  4.015  -3.590  16.117
## 1      -17.461   -27.383 -16.279  9 67.108 0.184 -0.408 -88.332   0.167
## 2        0.205    20.417  -0.885  9 12.907 0.157  1.582  -5.626   2.502
##    sex wald P1 > |z| P2 > |z|
## -1   12.886    0.000        0
## 1  7802.495    0.683        0
## 2    31.647    0.114        0

Significant models

get_significant_models(11) # mass_diff

get_significant_models(12) # sex

prediction_equations2(model_table, " Mass Change", " Sex ")
## [1] "Where F = 1"
## [1] "log(pi_-1 / pi_1) = 16.5 + 98.29 Mass Change + 15.49 Sex      Flew in T1, rather than T2"  
## [2] "log(pi_2 / pi_-1) = 1.17 + -50.49 Mass Change + -0.1 Sex      Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 17.67 + 47.8 Mass Change + 15.39 Sex      Flew in both, rather than T2"
N Model P
278 Flew in T1 only rather than T2 only = 16.5 + 98.29 Mass Change + 15.49 Sex not mass | sex**
278 Flew in both rather than T1 only = 1.25 - 50.49 Mass Change - 0.1 Sex mass** | not sex
278 Flew in both rather than T2 only = 17.67 + 47.8 Mass Change + 15.39 Sex not mass | sex**

Second: flight_case, mass_per, and sex

df <- df[with(df, order(mass_per)),]
data <- data.frame(R = df$flight_case, 
         A = df$mass_per,
         B = df$sex_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 2 FF.R")
##        [,1]      [,2]      
## AICs   587.5607  593.4209  
## models 3         4         
## probs  0.9492355 0.05068255
## 
## m3   multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m4   multinom(formula = R ~ A * B, data = data, trace = FALSE)
anova(m3, m4, test="Chisq") # Adding A*B does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##   Model Resid. df Resid. Dev   Test    Df  LR stat.   Pr(Chi)
## 1 A + B       825   569.5607                                 
## 2 A * B       822   569.4209 1 vs 2     3 0.1398496 0.9866598
delta_mass_model <- multinom(flight_case ~ mass_per + sex_c, data = df)
## # weights:  16 (9 variable)
## initial  value 385.389832 
## iter  10 value 286.869825
## iter  20 value 284.809036
## iter  30 value 284.797822
## final  value 284.780360 
## converged
model_table = calculate_P2(delta_mass_model, "mass_per", "sex_c")
## 
##  AIC:  587.5607 
##    (Intercept) mass_per  sex_c DF   SE1   SE2     z1      z2 ML wald sex wald
## -1      -1.015    0.043 -0.692  9 0.010 0.203  4.390  -3.408  19.272   11.617
## 1       -6.820   -0.009 -5.626  9 0.026 0.183 -0.348 -30.721   0.121  943.764
## 2        0.124    0.019 -0.902  9 0.008 0.159  2.334  -5.684   5.447   32.310
##    P1 > |z| P2 > |z|
## -1    0.000    0.001
## 1     0.728    0.000
## 2     0.020    0.000
prediction_equations2(model_table, " Mass Percent Change", " Sex ")
## [1] "Where F = 1"
## [1] "log(pi_-1 / pi_1) = 5.81 + 0.05 Mass Percent Change + 4.93 Sex      Flew in T1, rather than T2"    
## [2] "log(pi_2 / pi_-1) = 1.14 + -0.02 Mass Percent Change + -0.21 Sex      Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 6.94 + 0.03 Mass Percent Change + 4.72 Sex      Flew in both, rather than T2"

Significant models

run_multinom_model = function(d) {
  m <- multinom(flight_case ~ mass_per + sex_c, trace=FALSE, data = d) 
  model_table = calculate_P2(m, "mass_diff", "sex_c", print_table=FALSE)
  return(model_table)
}
get_significant_models(11)

get_significant_models(12)

N Model P
278 Flew in T1 only rather than T2 only = 5.81 + 0.05 Mass % + 4.93 Sex mass** | sex**
278 Flew in both rather than T1 only = 1.14 - 0.02 Mass % - 0.21 Sex mass** | not sex
278 Flew in both rather than T2 only = 6.94 + 0.03 Mass % + 4.72 Sex not mass | sex**

Estimated Odds

For males:

For bugs of mass_perc x + 20%*(0.05), the estimated odds that bug flew in T1 rather than T2 equals

exp(0.05*20)
## [1] 2.718282

times the estimated odds at mass_perc x (%). So for every 20% increase in a bug’s original mass, the bug was 2.72 times more likely to fly only in T1. That makes sense because the odds of flying in T2 only are very low to begin with.

For bugs of mass_perc x + 20*(-0.02%), the estimated odds that the bug flew twice instead of only in T1 equals

exp(20*-0.02) # 20 times less likely to fly twice if gain mass
## [1] 0.67032
exp(-20*-0.02) # 20 times more likely to fly twice if loose mass
## [1] 1.491825

times the estimated odds at mass_perc x (%). Similar message - for every 20% increase in a bug’s original mass, the bug was 1.5 times more likely to fly only in T1. So, when a bug starts loosing mass it becomes more likely to fly twice then only once.

For females:

For bugs of mass_perc x + 20%*(0.05), the estimated odds that bug flew in T1 rather than T2 equals

exp(0.05*20 + 4.93)
## [1] 376.1545

times the estimated odds at mass_perc x (%). So for every 20% increase in a bug’s original mass, the bug was 376 times more likely to fly only in T1. Crazy! It doesn’t take much for a female to only fly once.

For bugs of mass_perc x + 20%*(0.05), the estimated odds that bug flew in both trials rather than T1 equals

exp(-0.02*20 - 0.21)
## [1] 0.5433509
exp(-0.02*-40 - 0.21)
## [1] 1.803988

times the estimated odds at mass_perc x (%). So half as likely to fly in both trials if gain mass. Would need to loose a lot of mass (40) to be almost twice as likely (1.8) to fly in both trials.

exp(coef(delta_mass_model)*20) # this compares to no fly, the baseline # gain large % mass
##     (Intercept)  mass_per        sex_c
## -1 1.513930e-09 2.3406655 9.798741e-07
## 1  5.730838e-60 0.8325593 1.367423e-49
## 2  1.185738e+01 1.4736071 1.475858e-08
exp(coef(delta_mass_model)*5) # gain small % mass
##     (Intercept)  mass_per        sex_c
## -1 6.237728e-03 1.2369007 3.146245e-02
## 1  1.547229e-15 0.9552209 6.081010e-13
## 2  1.855655e+00 1.1017814 1.102202e-02
exp(coef(delta_mass_model)*-5) # loose small % mass
##     (Intercept)  mass_per        sex_c
## -1 1.603148e+02 0.8084723 3.178392e+01
## 1  6.463168e+14 1.0468783 1.644464e+12
## 2  5.388934e-01 0.9076211 9.072748e+01
exp(coef(delta_mass_model)*-20) # loose large % mass
##     (Intercept)  mass_per        sex_c
## -1 6.605326e+08 0.4272289 1.020539e+06
## 1  1.744946e+59 1.2011156 7.313027e+48
## 2  8.433568e-02 0.6786069 6.775721e+07
# 1/0 mass not sig but sex**
  • If F and gain a lot of mass then most likely to fly only in T1 rather than none.
  • If F and loose mass then most likely to not fly in either trial.

Predicted Probabilities

head(pp <- fitted(delta_mass_model))
##           0         -1            1         2
## 1 0.7917303 0.03003973 4.362037e-06 0.1782256
## 2 0.7894639 0.03073036 4.325625e-06 0.1798015
## 3 0.7844677 0.03228066 4.247094e-06 0.1832474
## 4 0.7823713 0.03294258 4.214827e-06 0.1846819
## 5 0.7601763 0.04036342 3.895616e-06 0.1994564
## 6 0.7574981 0.04130977 3.859619e-06 0.2011882
df$index = 1:nrow(df)
females = df %>%
  filter(sex=="F")
males = df %>%
  filter(sex=="M")
Frows = females$index
Mrows = males$index

No females flew in T2 only.

Adding Host Plant

df <- df[with(df, order(mass_diff)),]
data <- data.frame(R = df$flight_case, 
         A = df$mass_diff,
         B = df$sex_c,
         C = df$host_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 3 FF.R")
##        [,1]      [,2]      [,3]       [,4]       [,5]     
## AICs   589.7091  591.9755  593.0435   593.7062   593.9087 
## models 4         12        13         8          7        
## probs  0.5142205 0.1655767 0.09707198 0.06969088 0.0629815
## 
## m4   multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m12  multinom(formula = R ~ A * C + B, data = data, trace = FALSE)
## m13  multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
## m8   multinom(formula = R ~ A * B, data = data, trace = FALSE)
## m7   multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
anova(m4, m7, test="Chisq") # Adding C (host plant) does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1     A + B       825   571.7091                                
## 2 A + B + C       822   569.9087 1 vs 2     3 1.800378 0.6148527

Host plant is not significant.

Wing2Body

df <- df[with(df, order(mass_per)),]
data <- data.frame(R = df$flight_case, 
         A = df$mass_per,
         B = df$sex_c,
         C = df$wing2body)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 3 FF.R")
##        [,1]      [,2]      [,3]      
## AICs   582.3093  585.1022  587.1333  
## models 7         12        13        
## probs  0.6639449 0.1643044 0.05951306
## 
## m7   multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
## m12  multinom(formula = R ~ A * C + B, data = data, trace = FALSE)
## m13  multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
anova(m7, m12, test="Chisq") # adding A*C does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1 A + B + C       822   558.3093                                
## 2 A * C + B       819   555.1022 1 vs 2     3 3.207044 0.3607914
anova(m7, m13, test="Chisq") # Adding B*C does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1 A + B + C       822   558.3093                                
## 2 B * C + A       819   557.1333 1 vs 2     3 1.175993 0.7587677
calculate_P3 = function(m, print_table=TRUE) {
  
  # compute p-values using Wald tests (i.e. z-tests)
  s <- summary(m) 
  z <- s$coefficients/s$standard.errors 
  wald <- z^2
  p <- (1 - pnorm(abs(z), 0, 1)) * 2
    
  # summary table (similar to lm, glm, or lmer function)
  model_table <- cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[,2:4], z[,2:4], wald[,2:4], p[,2:4])
  colnames(model_table) <- c("(Intercept)", "mass %", "sex","wing2body","DF", 
                              "SE1", "SE2", "SE3", "z1", "z2", "z3",
                              "wald2","wald1", "wald3", "P1>|z|", "P2>|z|", "P3>|z|")
  if (print_table){
    cat("\n", "AIC: ", s$AIC, "\n") 
    print(round(model_table,3))    
  }
  return(model_table)
}
model <- multinom(flight_case ~ mass_per + sex_c + wing2body, data = df)
## # weights:  20 (12 variable)
## initial  value 385.389832 
## iter  10 value 291.096706
## iter  20 value 280.669331
## iter  30 value 279.898964
## iter  40 value 279.532350
## iter  50 value 279.155239
## final  value 279.154632 
## converged
model_table = calculate_P3(model)
## 
##  AIC:  582.3093 
##    (Intercept) mass %    sex wing2body DF   SE1   SE2    SE3     z1     z2
## -1     -17.862  0.041 -0.571    23.558 12 0.010 0.212 12.064  4.256 -2.697
## 1       -4.395 -0.005 -9.580    -8.937 12 0.025 6.669 18.679 -0.191 -1.436
## 2      -19.931  0.018 -0.760    28.019 12 0.008 0.166  9.729  2.145 -4.587
##        z3  wald2  wald1 wald3 P1>|z| P2>|z| P3>|z|
## -1  1.953 18.110  7.273 3.813  0.000  0.007  0.051
## 1  -0.478  0.036  2.063 0.229  0.849  0.151  0.632
## 2   2.880  4.600 21.039 8.293  0.032  0.000  0.004

Signal of flying in only T2 is weak, which is probably why the equation in the model was not significant. It does not differ much from not flying at all.

Significant equations

run_multinom_model = function(d) {
  m <- multinom(flight_case ~ mass_per + sex_c + wing2body, trace=FALSE, data = d)
  model_table = calculate_P3(m, print_table=FALSE)
  return(model_table)
}
get_significant_models(15) # mass%

get_significant_models(16) # sex

get_significant_models(17) # wing2body

Predicted Probabilities

head(pp <- fitted(model))
##           0         -1            1         2
## 1 0.7474703 0.03578853 1.245735e-09 0.2167412
## 2 0.8118293 0.02849196 1.532139e-09 0.1596788
## 3 0.6989358 0.04308981 1.080435e-09 0.2579744
## 4 0.8015113 0.03105505 1.486385e-09 0.1674336
## 5 0.7665014 0.04013877 1.345112e-09 0.1933599
## 6 0.7891792 0.03740734 1.448068e-09 0.1734135

Amazing to see what a clear impact wing2body ratio has for whether bugs are more likely to fly twice or not fly at all. Those with wing2body ratios above the mean wing2body ratio appear to be more likely to fly twice and less likely to not fly. This is true across females and males.

Females only

Baseline

df <- d %>%
  filter(!is.na(egg_diff), !is.na(mass_diff), !is.na(flew_diff), sex_c == 1)

df <- df[with(df, order(mass_diff)),]

n_tfemales = nrow(df)

df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")

Barplot

Egg Case

Delta Egg Response Key
Event Encoding
laid eggs in both trials 2
laid eggs in T2 only 1
laid eggs in neither trials 0
laid eggs in T1 only -1

Baseline

df <- d %>%
  filter(!is.na(egg_case), !is.na(mass_diff), !is.na(flew_diff), sex_c == 1)

df <- df[with(df, order(mass_diff)),]

n_tfemales = nrow(df)

df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")

Null model

df$flight_case = droplevels(df$flight_case) # no female bug only flew in T2
null <- multinom(flight_case ~ 1, data = df) 
## # weights:  6 (2 variable)
## initial  value 102.170943 
## final  value 93.055466 
## converged

Multinom model: flew_diff ~ egg_case

model <- multinom(flight_case ~ egg_case, data = df) 
## # weights:  9 (4 variable)
## initial  value 102.170943 
## final  value 84.114906 
## converged
model_table = calculate_P(model)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## 
##  AIC:  176.2298 
##    (Intercept)  Estimate DF Std. Err.      z   wald P > |z|
## -1      -0.194    -0.656  4     0.345 -1.902  3.616   0.057
## 2        0.697    -1.189  4     0.314 -3.782 14.306   0.000
anova(null, model, test="Chisq") # adding egg_case improves fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: flight_case
##      Model Resid. df Resid. Dev   Test    Df LR stat.      Pr(Chi)
## 1        1       184   186.1109                                   
## 2 egg_case       182   168.2298 1 vs 2     2 17.88112 0.0001309677

Significant models

get_significant_modelsf(7) #-1/0 egg case not sig | 0/-1 or 2/-1 egg case not sig | -1/2 egg case not sig | but 0/2 and 2/0 egg case**
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

Comparing models: adding host

Adding host with egg_diff and mass_diff

Host Plant Key
Host Encoding
Golden Rain Tree (GRT) 1
Balloon Vine (BV) -1
data <- data.frame(R = df$flight_case, 
         A = df$egg_case,
         B = df$mass_diff,
         C = df$host_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 3 FF.R")
##        [,1]      [,2]      [,3]      [,4]       [,5]      
## AICs   165.1826  166.6029  167.0497  167.9211   167.9513  
## models 7         13        4         11         16        
## probs  0.3572256 0.1756085 0.1404487 0.09084268 0.08948043
## 
## m7   multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
## m13  multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
## m4   multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m11  multinom(formula = R ~ A * B + C, data = data, trace = FALSE)
## m16  multinom(formula = R ~ B * C + A * B, data = data, trace = FALSE)
anova(m4, m7, test="Chisq") # Adding C does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df LR stat.    Pr(Chi)
## 1     A + B       180   155.0497                                 
## 2 A + B + C       178   149.1826 1 vs 2     2  5.86705 0.05320915
anova(m7, m13, test="Chisq") # Adding  mass_diff*host does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1 A + B + C       178   149.1826                                
## 2 B * C + A       176   146.6029 1 vs 2     2 2.579779 0.2753012
model <- multinom(flight_case ~ mass_diff + egg_case, data = df) 
## # weights:  12 (6 variable)
## initial  value 102.170943 
## iter  10 value 81.763896
## iter  20 value 77.915648
## iter  30 value 77.575673
## iter  40 value 77.528423
## iter  50 value 77.524883
## final  value 77.524846 
## converged
model_table = calculate_P2(model, "mass_diff", "egg_case")
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## 
##  AIC:  167.0497 
##    (Intercept) mass_diff egg_case DF    SE1   SE2    z1     z2 ML wald sex wald
## -1      -0.879    57.434   -0.534  6 17.992 0.384 3.192 -1.391  10.190    1.935
## 2        0.527    18.699   -1.089  6 14.416 0.296 1.297 -3.674   1.682   13.500
##    P1 > |z| P2 > |z|
## -1    0.001    0.164
## 2     0.195    0.000

Significant eqs

run_multinom_model = function(d) {
  m <- multinom(flight_case ~ mass_diff + egg_case, trace=FALSE, data = d) 
  model_table = calculate_P2(m, "mass_diff", "egg_case", print_table=FALSE)
  return(model_table)
}
get_significant_modelsf(11) # mass diff
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

get_significant_modelsf(12) # egg_case
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)

tb = model_table
# -1 / 2 | Flew in T1, not T2
I = (tb[1,1] - tb[2,1])
M = (tb[1,2] - tb[2,2])
E = (tb[1,3] - tb[2,3])
EQ1 = paste0("log(pi_-1 / pi_2) = ", round(I, 2), " + ", round(M,2), " Mass Diff", " + ", round(E, 2), " Egg Case", "     Flew in T1, rather than both" )
EQ1  # mass_dff** | sex not
## [1] "log(pi_-1 / pi_2) = -1.41 + 38.73 Mass Diff + 0.56 Egg Case     Flew in T1, rather than both"

Predicted Probabilities

head(pp <- fitted(model))
##           0          -1          2
## 1 0.9145007 0.009854788 0.07564450
## 2 0.2856071 0.021532897 0.69286000
## 3 0.7647877 0.021005983 0.21420635
## 4 0.7482798 0.025860598 0.22585957
## 5 0.8863560 0.020152630 0.09349132
## 6 0.8810560 0.022470525 0.09647351

  • Eggs laid twice: most likely to not fly unless gain more than about 0.025 g mass and then fly only in T1 (largest mass change)
  • Eggs laid in T2: most likely to not fly (2nd widest mass change)
  • Eggs laid only in T1: most likely to fly twice (mostly only lost mass)
  • Eggs not laid: most likely to fly twice always (smallest mass change and mostly only gained about 0.030 g)

wing2body

Baseline

df <- d %>%
  filter(!is.na(egg_case), !is.na(mass_diff), !is.na(flew_diff), !is.na(wing2body), sex_c == 1)

df <- df[with(df, order(mass_diff)),]

n_tfemales = nrow(df)

df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")
data <- data.frame(R = df$flight_case, 
         A = df$egg_case,
         B = df$mass_diff,
         C = df$wing2body)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 3 FF.R") # strange error...
##        [,1]      [,2]      [,3]      [,4]      [,5]       [,6]      
## AICs   165.9791  167.0497  168.793   169.6042  169.9652   169.9733  
## models 7         4         12        11        13         8         
## probs  0.4170701 0.2441983 0.1021376 0.0680834 0.05683876 0.05661042
## 
## m7   multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
## m4   multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m12  multinom(formula = R ~ A * C + B, data = data, trace = FALSE)
## m11  multinom(formula = R ~ A * B + C, data = data, trace = FALSE)
## m13  multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
## m8   multinom(formula = R ~ A * B, data = data, trace = FALSE)
anova(m4, m7, test="Chisq") # adding wing2body does not include fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df LR stat.    Pr(Chi)
## 1     A + B       180   155.0497                                 
## 2 A + B + C       178   149.9791 1 vs 2     2 5.070548 0.07924002
anova(m7, m12, test="Chisq") # Adding A*C does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1 A + B + C       178   149.9791                                
## 2 A * C + B       176   148.7930 1 vs 2     2 1.186134 0.5526299
anova(m7, m11, test="Chisq") 
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df  LR stat.   Pr(Chi)
## 1 A + B + C       178   149.9791                                 
## 2 A * B + C       176   149.6042 1 vs 2     2 0.3749583 0.8290464
anova(m7, m13, test="Chisq") 
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##       Model Resid. df Resid. Dev   Test    Df   LR stat.   Pr(Chi)
## 1 A + B + C       178   149.9791                                  
## 2 B * C + A       176   149.9652 1 vs 2     2 0.01392852 0.9930599
anova(m4, m8, test="Chisq") # Adding A*B does not improve fit
## Likelihood ratio tests of Multinomial Models
## 
## Response: R
##   Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1 A + B       180   155.0497                                
## 2 A * B       178   153.9733 1 vs 2     2 1.076425 0.5837908

wing2body is not in the top model for females

Encodings

Delta Flight Response Key
Event Encoding
flew in both trials 2
flew in T2 only 1
flew in neither trials 0
flew in T1 only -1
Delta Mass Key
Event Sign
gained mass from T1 to T2 pos
no mass change between trails 0
lost mass from T1 to T2 neg
Host Plant Key
Host Encoding
Golden Rain Tree (GRT) 1
Balloon Vine (BV) -1

Females Only

Delta Flight Response Key
Event Encoding
flew in both trials 2
flew in neither trial 0
flew in T1 only -1
Delta Egg Response Key
Event Encoding
laid eggs in both trials 2
laid eggs in T2 only 1
laid eggs in neither trials 0
laid eggs in T1 only -1

Summary

flight case ~ mass diff

flight case ~ mass per

flight case ~ mass per and sex

wing2body

females only

flight case ~ mass diff and egg case

Questions

  • Egg case and egg diff not factors. Not sure how to work around this problem.
  • Animate plots?
  • For Fall season some bugs were tested more than 2 times (up to four times), but which to pull from and from which experiments?